Background/Aims: The aim of this study was to analyze comorbidity in patients with type 2 diabetes mellitus (T2DM) 
by using association rule mining (ARM). 

Methods: We used data from patients who visited Keimyung University Dongsan Medical Center from 1996 to 2007. 
Of 411,414 total patients, T2DM was present in 20,314. The Dx Analyze Tool was developed for data cleansing and data 
mart construction, and to reveal associations of comorbidity. 

Results: Eighteen associations reached threshold (support, > 3%; confidence, > 5%). The highest association was 
found between T2DM and essential hypertension (support, 17.43%; confidence, 34.86%). Six association rules 
were found among three comorbid diseases. Among them, essential hypertension was an important node between 
T2DM and stroke (support, 4.06%; confidence, 8.12%) as well as between T2DM and dyslipidemia (support, 3.44%; 
confidence, 6.88%). 

Conclusions: Essential hypertension plays an important role in the association between T2DM and its comorbid 
diseases. The Dx Analyze Tool is practical for comorbidity studies that have an enormous clinical database. 
INTRODUCTION 

According to national health statistics in Korea, 
the prevalence of type 2 diabetes mellitus (T2DM) in- 
creased from 8.6% in 2001 to 9.5% in 2007, while the 
prevalence of T2DM in the United States was 10.7% in 
2007. Furthermore, the prevalence of T2DM in 2007 
in men (11.6%) was higher than in women (7.8%). The 
prevalence was highest in men aged 60-69 years (26.6%) 
and in women aged 70-79 years (19.5%) [1]. 

Patients with T2DM have an increased incidence of disease in several internal organs and tissues.
Chronic microvascular and macrovascular diseases have greater influence on the long-term prognosis
of patients with T2DM than acute complications [2]. Investigating the associations of these
complications with comorbid diseases by using patient diagnostic data is helpful in predicting
their incidence and thus more effectively treating patients with T2DM.

Association rule mining (ARM) describes how two 
items are related using a special method of exploring 
patterns different from other analysis techniques [3]. 
The association rule generated from ARM can formulate the relation between X and Y in the form of
"X → Y" or "If X, then Y," and analyze it as "If item X exists, item Y coexists" [4]. A rule does
not necessarily imply cause and effect. Instead, it identifies simultaneous occurrence between
items in antecedent X and consequent Y. ARM makes it possible to analyze associations not only
between two diseases but also among three or more comorbidities that can be calculated from
existing statistics. One study revealed the accompanying diseases of attention
deficit/hyperactivity disorder by applying ARM to diagnostic data from the National Health
Insurance Database of Taiwan [5]. Another study analyzed stroke and its comorbid diseases by
ARM [6]. Therefore, the current study was conducted to determine the relations among
complications, the various diseases that accompany T2DM, and three or more comorbidities, using
ARM based on large amounts of clinical data.



METHODS 

Study population 

Data from 411,414 patients examined at the Keimyung University Dongsan Medical Center from 1996
to 2007 were analyzed using the Dx Analyze Tool. Among these patients, 20,314 had T2DM, with a
total of 145,306 diagnostic records. As the control group for the analysis, 20,314 patients
without a diagnosis of T2DM were included, with a total of 57,379 diagnostic records.

Data collection 

The workflow of the association analysis of T2DM 
comorbid diseases is shown in Fig. 1. First, data were 
collected from the database of patients examined at 
Keimyung University Dongsan Medical Center from 
1996 to 2007. Personal information of the subjects such 
as name, gender, age, and contact details was not col- 
lected. 

Analysis method 

Figure 1. Schematic diagram of the study workflow: extraction of diagnosis data; analysis by
using Dx Analyzer v.1.0; Step 1, data saving; Step 2, data cleansing; Step 3, data mart
construction; Step 4, select Dx code; Step 5, analysis by Apriori modeling; assessment of rules;
find out the rules.

For the current study, we developed the Dx Analyze Tool using the Apriori algorithm (C# 2.0,
MS Access DB) [4,7] to analyze the association between clinical diagnoses. The Dx Analyze Tool,
which refines the data and extracts an association rule between a specific disease and its
related diseases, involves five steps: data retention, data cleansing, data mart construction,
selection of Dx code, and analysis by the Apriori algorithm. The Apriori algorithm is an ARM
technique. Its rules specify that when item-set A appears, item-set B appears with it. The rules
are evaluated by support (the proportion of all cases in which disease A and disease B occur
together) and confidence (the proportion of cases with disease A in which disease B co-occurs).
The formulas for support and confidence have been previously described [4,8,9] and are presented
below.

$$\text{Support (\%)} = \frac{\text{Number of disease } A \cap B}{\text{Total number of disease}} \times 100$$

$$\text{Confidence (\%)} = \frac{\text{Number of disease } A \cap B}{\text{Number of disease } A} \times 100$$
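
As a quick check of the support and confidence definitions, the short Python sketch below recomputes both measures for the rule T2DM (E11) and essential hypertension (I10) from the counts reported in Table 2. It is only an illustration and is not part of the Dx Analyze Tool; the function and variable names are hypothetical, and it assumes that the denominator for support is the 40,628 subjects given as n in Table 2.

```python
def support_pct(n_both, n_total):
    """Support (%): share of all cases in which A and B occur together."""
    return 100.0 * n_both / n_total


def confidence_pct(n_both, n_antecedent):
    """Confidence (%): share of cases with A in which B also occurs."""
    return 100.0 * n_both / n_antecedent


# Counts taken from Table 2 (assumed denominators, see the text above).
n_total = 40628          # subjects with T2DM plus controls
n_t2dm = 20314           # subjects with T2DM (antecedent E11)
n_t2dm_and_htn = 7081    # subjects with both E11 and I10

support = support_pct(n_t2dm_and_htn, n_total)
confidence = confidence_pct(n_t2dm_and_htn, n_t2dm)

print(round(support, 2))     # 17.43
print(round(confidence, 2))  # 34.86
```

Under these assumptions the rounded values agree with the support (17.43%) and confidence (34.86%) listed for this rule.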

Using SPSS version 18.0 (SPSS Inc., Chicago, IL, 
USA), the chi-square test was used to review the associ- 
ation rules generated by the Dx Analyze Tool and to dis- 
cern differences between groups with or without T2DM 
in the distribution of diseases appearing by the associa- 
tion rule. The results from the Dx Analyze Tool and the 
chi-square test found that a meaningful association rule 
exists between T2DM and other diseases. 
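
The chi-square comparison can be illustrated with a small Python sketch. It computes the uncorrected Pearson chi-square statistic for a 2 x 2 table, using the E78 counts from Table 3 as input. This is an assumption-laden illustration rather than a reproduction of the SPSS procedure: the source does not state whether a continuity correction was applied, and the function name is hypothetical.

```python
def pearson_chi_square(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# E78 (disorders of lipoprotein metabolism) counts from Table 3.
group_size = 20314       # subjects in each group (T2DM and control)
e78_t2dm = 2771          # T2DM subjects with E78
e78_control = 533        # control subjects with E78

chi2 = pearson_chi_square(
    e78_t2dm, group_size - e78_t2dm,
    e78_control, group_size - e78_control,
)
print(round(chi2, 2))
```

The statistic obtained this way is approximately 1,650, in line with the value listed for E78 in Table 3.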
Table 1. High frequency comorbid diseases with type 2 diabetes mellitus (n = 20,314) 

Dx code  Dx name                                                                   No.     %
I10      Essential (primary) hypertension                                          7,081   34.86
K29      Gastritis and duodenitis                                                  3,170   15.61
H25      Senile cataract                                                           3,134   15.43
E78      Disorders of lipoprotein metabolism and other lipidemias                  2,771   13.64
H36      Retinal disorders in diseases classified elsewhere                        2,597   12.78
I63      Cerebral infarction                                                       2,522   12.42
I20      Angina pectoris                                                           2,520   12.41
N18      Chronic renal failure                                                     1,638   8.06
K25      Gastric ulcer                                                             1,617   7.96
M81      Osteoporosis without pathological fracture                                1,464   7.21
I50      Heart failure                                                             1,374   6.76
K21      Gastroesophageal reflux disease                                           1,323   6.51
I21      Acute myocardial infarction                                               1,192   5.87
H35      Other retinal disorders                                                   1,183   5.82
K76      Other hepatic diseases                                                    1,152   5.67
G63      Polyneuropathy in diseases classified elsewhere                           1,082   5.33
J15      Bacterial pneumonia, not elsewhere classified                             1,042   5.13
Z03      Medical observation and evaluation for suspected diseases and conditions  1,025   5.05
K74      Hepatic fibrosis and cirrhosis                                            1,024   5.04




RESULTS 

Diseases frequently accompanying T2DM 

Diseases that frequently accompany T2DM are summarized in Table 1. The most frequent disease was
essential hypertension (34.86% of all subjects), followed by gastritis and duodenitis (15.61%),
senile cataract (15.43%), lipidemias and other disorders of lipoprotein metabolism (13.64%), and
retinal disease (12.78%).

Association rule resulting from the Apriori 
algorithm 

The association rules between T2DM and comorbid diseases generated by the Apriori algorithm are
presented in Table 2. The thresholds were set at > 3% for support and > 5% for confidence, and
18 rules satisfying these conditions were generated. The rule with the highest support and
confidence was T2DM → essential hypertension (support, 17.43%; confidence, 34.86%). Other rules
with high support and confidence were T2DM → gastritis/duodenitis (support, 7.80%; confidence,
15.61%), T2DM → senile cataract (support, 7.71%; confidence, 15.43%), T2DM → disorders of
lipoprotein metabolism and other lipidemias (support, 6.82%; confidence, 13.64%), and
T2DM → retinal disease (support, 6.39%; confidence, 12.78%). Rules showing an association among
three diseases included T2DM → essential hypertension and stroke (support, 4.06%; confidence,
8.12%), T2DM → essential hypertension and disorders of lipoprotein metabolism and other
lipidemias (support, 3.44%; confidence, 6.88%), and T2DM → senile cataract and retinal disease
(support, 3.39%; confidence, 6.78%).
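
How threshold-based rule generation of this kind works can be sketched in Python on a toy data set. The code below is a minimal level-wise (Apriori-style) search followed by rule filtering for a fixed antecedent. It is not the Dx Analyze Tool (which the Methods describe as written in C# with an MS Access DB); the function names, the ten toy records, and the toy thresholds (support > 15%, confidence > 30%) are all hypothetical and were chosen only so that the small example yields rules.

```python
from itertools import combinations


def frequent_itemsets(records, min_support):
    """Level-wise search for itemsets whose support exceeds min_support."""
    n = len(records)
    singletons = sorted({code for record in records for code in record})
    candidates = [frozenset([code]) for code in singletons]
    frequent = {}
    size = 1
    while candidates:
        # Count how many records contain each candidate itemset.
        counts = {c: sum(1 for record in records if c <= record) for c in candidates}
        kept = {c: k for c, k in counts.items() if k / n > min_support}
        frequent.update(kept)
        size += 1
        # Join step: merge kept itemsets into candidates one item larger.
        joined = {a | b for a in kept for b in kept if len(a | b) == size}
        # Prune step: every smaller subset of a candidate must itself be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(sub) in kept for sub in combinations(c, size - 1))
        ]
    return frequent


def rules_with_antecedent(frequent, antecedent, n, min_confidence):
    """Rules 'antecedent -> rest' as {consequent codes: (support, confidence)}."""
    base = frozenset([antecedent])
    rules = {}
    for itemset, count in frequent.items():
        if base < itemset:  # itemset strictly contains the antecedent
            confidence = count / frequent[base]
            if confidence > min_confidence:
                rules[tuple(sorted(itemset - base))] = (count / n, confidence)
    return rules


# Toy diagnosis records (one set of Dx codes per hypothetical patient).
records = [
    {"E11", "I10", "I63"}, {"E11", "I10"}, {"E11", "K29"}, {"E11"},
    {"E11", "I10", "I63"}, {"I10"}, {"K29"}, {"J15"}, {"I10", "I63"}, {"H25"},
]

frequent = frequent_itemsets(records, min_support=0.15)
rules = rules_with_antecedent(frequent, "E11", len(records), min_confidence=0.30)
for consequent, (support, confidence) in sorted(rules.items()):
    print("E11 ->", ", ".join(consequent), round(support, 2), round(confidence, 2))
```

On the toy data, the rules E11 → I10, E11 → I63, and E11 → I10, I63 pass both thresholds, whereas E11 → K29 is dropped because its support is too low. Fixing the antecedent to one code mirrors the form of the rules in Table 2, where every rule starts from E11.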

Statistical examination of ARM analysis results 

The results of the statistical analysis to determine the distribution of diseases occurring with
or without T2DM are summarized in Table 3. Subjects with T2DM were more likely than those without
T2DM to have disorders of lipoprotein metabolism and other lipidemias, senile cataract, retinal
disorders, essential hypertension, angina pectoris, heart failure, cerebral infarction,
gastroesophageal reflux disease, gastric ulcer, gastritis and duodenitis, osteoporosis without
pathological fracture, and chronic renal failure (p < 0.05).

Table 2. Association rules between type 2 diabetes mellitus and comorbid diseases (n = 40,628) 

Rule              No.     Support (%)  Confidence (%)
E11 → I10         7,081   17.43        34.86
E11 → K29         3,170   7.80         15.61
E11 → H25         3,134   7.71         15.43
E11 → E78         2,771   6.82         13.64
E11 → H36         2,597   6.39         12.78
E11 → I63         2,522   6.21         12.42
E11 → I20         2,520   6.20         12.41
E11 → I10, I63    1,649   4.06         8.12
E11 → N18         1,638   4.03         8.06
E11 → K25         1,617   3.98         7.96
E11 → M81         1,464   3.60         7.21
E11 → I10, E78    1,398   3.44         6.88
E11 → H25, H36    1,378   3.39         6.78
E11 → I50         1,374   3.38         6.76
E11 → I10, I20    1,342   3.30         6.61
E11 → K21         1,323   3.26         6.51
E11 → I10, K29    1,310   3.22         6.45
E11 → I10, H25    1,263   3.11         6.22

E11, type 2 diabetes mellitus; I10, essential (primary) hypertension; K29, gastritis and duodenitis; H25, senile cataract; E78, disorders 
of lipoprotein metabolism and other lipidemias; H36, retinal disorders in diseases classified elsewhere; I63, cerebral infarction; I20, 
angina pectoris; N18, chronic renal failure; K25, gastric ulcer; M81, osteoporosis without pathological fracture; I50, heart failure; K21, 
gastroesophageal reflux disease. 

Table 3. Statistical analysis of the association rule mining results (n = 40,628) 

Dx code  E11            Non E11        χ²         p value
E78      2,771 (13.6)   533 (2.6)      1,650.12   0.000
H25      3,134 (15.4)   380 (1.9)      2,362.72   0.000
H36      2,597 (12.8)   21 (0.1)       2,709.25   0.000
I10      7,081 (34.9)   1,186 (5.8)    5,277.43   0.000
I20      2,520 (12.4)   522 (2.6)      1,418.50   0.000
I50      1,374 (6.8)    257 (1.3)      796.97     0.000
I63      2,522 (12.4)   442 (2.2)      1,574.51   0.000
K21      1,323 (6.5)    598 (2.9)      287.20     0.000
K25      1,617 (8.0)    441 (2.2)      707.85     0.000
K29      3,170 (15.6)   1,385 (6.8)    787.82     0.000
M81      1,464 (7.2)    201 (1.0)      999.00     0.000
N18      1,638 (8.1)    167 (0.8)      1,254.54   0.000

Values are presented as number (%). 

E11, type 2 diabetes mellitus; E78, disorders of lipoprotein metabolism and other lipidemias; H25, senile cataract; H36, retinal disorders 
in diseases classified elsewhere; I10, essential (primary) hypertension; I20, angina pectoris; I50, heart failure; I63, cerebral infarction; 
K21, gastroesophageal reflux disease; K25, gastric ulcer; K29, gastritis and duodenitis; M81, osteoporosis without pathological fracture; 
N18, chronic renal failure. 

DISCUSSION 

This study was conducted to analyze the association between T2DM and comorbid diseases. Prior to
this study, a pilot study was performed, in which the comorbidity of cerebral infarction
patients [6] and essential hypertension patients [10] was analyzed by ARM. On the basis of the
pilot study, the present study constructed a data mart by refining diagnostic data extracted from
patients of our medical center. Association rules involving three or more diseases, including
T2DM, were ascertained by developing a program that generates association rules by applying the
Apriori algorithm of ARM.

T2DM is frequently accompanied by one or more components of metabolic syndrome such as obesity,
dyslipidemia, and hypertension. A patient with hypertension is 2.4 times more likely to develop
cerebrovascular disease [11]. A study that examined diabetic complications in 5,652 patients with
diabetes from 13 university hospitals in Korea reported that hypertension and dyslipidemia are
accompanying comorbid conditions in 60.4% and 44.1%, respectively, of these patients.
Additionally, 38.4% and 44.7% of patients had retinopathy and neuropathy, respectively [2].
Another study [12] reported that 77.9% of 4,240 patients with T2DM from 13 university hospitals
in Korea had metabolic syndrome, with the prevalence of each component of metabolic syndrome
being 56.8% for central obesity, 42.0% for hypertriglyceridemia, 65.1% for low high-density
lipoprotein cholesterol, and 74.9% for hypertension. Despite different research methods, the
results of the present study agree with previous studies and link T2DM with essential
hypertension, disorders of lipoprotein metabolism and other lipidemias, retinal disease, cerebral
infarction, and angina pectoris. Specifically, T2DM and essential hypertension had the highest
association, and this association produced the following association rules: T2DM → essential
hypertension and cerebral infarction, T2DM → essential hypertension and disorders of lipoprotein
metabolism and other lipidemias, and T2DM → essential hypertension and angina pectoris. A
previous comorbidity study on cerebral infarction revealed the rule disorders of lipoprotein
metabolism and essential hypertension → cerebral infarction by the Apriori algorithm, as well as
the rule T2DM and essential hypertension → cerebral infarction [5].

Patients with T2DM often have irregular diet patterns, which deleteriously influence glucose
control, lipid metabolism, and micronutrient intake [13]. In addition, T2DM is progressive and
generally incurable, predisposing patients to several complications related to poor glucose
regulation [14]. The use of medications to counteract the complications of diet and the disease
itself can cause and exacerbate gastric disorders. This was recently shown by the link between
T2DM and gastroesophageal reflux disease, gastric ulcer, and gastritis and duodenitis.

Fasting glucose and diabetes correlate with the occurrence of cataracts, and metabolic disorders
of the body increase the risk of the occurrence of cataracts. Specifically, the risk of cataracts
increases with low levels of high-density lipoprotein cholesterol, hypertension, and high fasting
glucose [15]. The present data also support an association among T2DM, senile cataract, and
essential hypertension. However, an association with dyslipidemia was not found and this requires
further study.

Although the present study showed that T2DM is 
associated with heart failure and chronic renal fail- 
ure, other studies on T2DM did not show such results 
[2,11,14]. Park et al. [16] investigated the cause of death 
in 680 patients with T2DM and reported that cerebro- 
vascular disease (15.0%), ischemic heart disease (15.6%), 
infectious disease (25.3%), cancer (21.9%), congestive 
heart failure (7.1%), kidney disease (4.7%), and other 
diseases are major causes of death, which offers support 
for an association rule for T2DM, congestive heart fail- 
ure, and chronic kidney disease. 

In the present study, 7.21% (1,464 patients) of the pa- 
tients with T2DM displayed accompanying osteoporosis 
without pathological fracture, and the association rule 
of T2DM^osteoporosis without pathological fracture 
was generated. Patients with T2DM were found to have 
more concurrent osteologic diseases than nondiabetic 
patients, suggesting that patients with T2DM may have 
decreased bone density [17]. 

This study determined comorbidities using the association rules generated for the diagnosis data
of patients with T2DM by applying ARM from previous studies. While the possibility exists that
doctors added diagnoses excessively to increase prescriptions or that comorbidities were found
but not recorded, the majority of cases were diagnosed accurately, and the few inaccuracies were
filtered by using large amounts of clinical data.

The Korean Journal of Internal Medicine Vol. 27, No. 2, June 2012
http://dx.doi.org/10.3904/kjim.2012.27.2.197
http://www.kjim.or.kr

This study was significant because it was based on a 
large amount of data generated using electronic medi- 
cal records in clinical use, a constructed data mart, 
and analysis of the comorbidity of DM using a program 
that automates the determination of the Apriori algo- 
rithm. However, a limitation of the present study is that 
the data came from a single medical institution. Data 
from other medical facilities should be collected and 
analyzed to demonstrate the relevance of the program 
and its results. Furthermore, the Apriori algorithm is 
limited in determining precedence or causality of dis- 
ease. Therefore, future studies to identify the temporal 
complications of diseases considering chronology (e.g., 
the sequential pattern of disease occurrence) should be 
conducted. 